ABSTRACT



Seven undergraduate volunteers studied a written
passage on Atomic Structure and then, while answering a set of 24
multiple-choice items, talked aloud about the strategies they were
using for option selection. The tape recordings of their verbal
responses were analyzed for latency, memory references, and inference
references. The items testing knowledge required a shorter time to
answer, and the verbal reports contained more words and phrases
associated with memory processes, and fewer associated with inference,
than did those for the items testing higher-order skills. The results
suggest the usefulness of a more complex definition of item
difficulty. (Author)
MEMORY VS. INFERENCE: A PRELIMINARY STUDY
OF PROCESS-REFERENCED TEST ITEMS*
David V. Williams, James J. Diamond,

Elizabeth Kreher and Jude Braucher 

Ithaca College, University of Pennsylvania

Recent interest in performance-based evaluation and criterion-referenced
tests has recalled attention to the need to explore process as well as content
variables in the establishment of criteria.

Recall of specific facts is relatively easy to measure by means of present
testing techniques, and it might be suggested that items of this sort are
the dominant type in multiple-choice based evaluation. While there are several
extensive systems describing complex levels of cognitive functioning together
with instructional suggestions (e.g., Bloom et al., 1956), none has been
demonstrated to be useful in the construction of multiple-choice items measuring
the attainment of these levels. Rather, it is generally assumed that the item
writer's intent alone is sufficient to produce an item which will successfully
engage the student in a given cognitive function. It would be desirable to
approach the development of instruments which are empirically valid measures
of given processes.

Kropp (1956) investigated the relationship between solution process and
item performance. However, since his items were from a standardized instrument,
he was not in a position to know the solution process intended by the item writer.
Connolly and Wantman (1964) also worked with standardized instruments. Little
and D'Asaro (1965) studied solution processes of a single student on a biology
examination. However, they were more concerned with test scoring issues than
with cognitive processes per se.

A previous study (Diamond & Williams, 1972) compared the item writer's
process-intent directly with the student's report of process. No consistent
relation was found between global ratings of written descriptions of
item-solution processes and the four Bloom processes of Knowledge (K), Comprehension
(C), Application (Ap), and Analysis (An) intended by the item writer.

Thus the present study sought to test the hypothesis that "process"¹
distinctions would be found between two types of items: those requiring
responses in terms of the materials read (K) and those reflecting "higher order
skills" (HOS) (i.e., C, Ap, An), involving some transformation and/or
recombination of the materials presented. Specifically, it was hypothesized that K
items would have shorter response latencies than HOS items and that there would
be recognizable semantic and syntactic differences in the verbal reports
associated with the two item types.



*Paper presented at meetings of the American Educational Research Association,
Washington, April 1975. This research was supported by a grant from the Spencer
Foundation.

¹The differences among behavioral, psychometric and neurological definitions
of "process" are being laid aside for the present purpose.
Two modules, each consisting of a passage ("Atomic Structure";
"Glaciers") and 24 multiple-choice items, were adapted from materials
developed by Kropp & Stoker (1956). From items judged by them to involve the
Bloom skills of Knowledge, Comprehension, Application and Analysis, six of each
type were selected and randomly arranged in booklet form, one to a page
(with the constraint that three of each type occur in each half of the booklet
and no more than two items of one type be adjacent).
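
The constrained random arrangement described above can be illustrated with a short sketch. This is not the authors' procedure, which the text does not specify; it is one possible approach, assuming simple rejection sampling and using type labels in place of the actual items. The names `satisfies_constraints` and `arrange_booklet` are hypothetical.

```python
import random

ITEM_TYPES = ["K", "C", "Ap", "An"]  # Bloom categories named in the text


def satisfies_constraints(order):
    """Check the two booklet constraints described in the text."""
    half = len(order) // 2
    # Three items of each type must occur in each half of the booklet.
    for part in (order[:half], order[half:]):
        if any(part.count(t) != 3 for t in ITEM_TYPES):
            return False
    # No more than two items of one type may be adjacent (no run of three).
    for i in range(len(order) - 2):
        if order[i] == order[i + 1] == order[i + 2]:
            return False
    return True


def arrange_booklet(rng):
    """Shuffle each half separately; retry until no run of three occurs."""
    half = ITEM_TYPES * 3  # three of each type per half (assumed approach)
    while True:
        first, second = half[:], half[:]
        rng.shuffle(first)
        rng.shuffle(second)
        order = first + second
        if satisfies_constraints(order):
            return order


# The seed is arbitrary and used only to make the sketch repeatable.
booklet = arrange_booklet(random.Random(0))
print(booklet)
```

Shuffling the halves separately guarantees the three-per-half constraint by construction, so only the adjacency constraint needs to be checked by retrying.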

Subjects were undergraduate volunteers who worked individually with both
passages on separate occasions, in a counterbalanced sequence. Each student
was asked to read the passage as (s)he would if studying for a test (i.e., to
make notes, underline, etc.) for as long as (s)he wished. The passage was
then removed and (s)he was asked to answer the multiple-choice questions while
talking about the strategies (s)he was using ("...what you are thinking about...
the basis for choosing an answer..."). Each was continuously tape-recorded
while answering the items. An observer, to whom the student could direct these
statements, was present and functioned only in the role of an attentive listener.
Both were blind to the specific hypothesis of the study.

Tapes were blindly and independently surveyed, and key words and phrases
associated with "memory" and "inference" were identified and collated.
"Memory references" consisted of (1) explicit references to the passage ("...it
said...mentions...was on page...") and (2) words whose ordinary meanings relate
to the reporting of events ("...recognize...remember..."). "Inference
references" consisted of (1) words and phrases whose ordinary meanings relate to the
concept of inference ("...suppose...infer...figure out...") and (2)
subjunctives and logical conjunctions suggesting combination, transformation, or
predication ("...would have...therefore...but...if..."). An experimenter, blind
to the categorization of items, then recorded the number of references of each
type and choice-latency for each item from the tapes.
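
As an illustration only, the tallying of references might be approximated by counting cue phrases in a transcribed report. The study's counts were made by human raters working from tapes, and the full cue lists are not given, so the lists below contain only the examples quoted above and the function name `count_references` is hypothetical.

```python
import re

# Cue lists limited to the examples quoted in the text; the authors'
# complete lists are not reported, so these are assumptions.
MEMORY_CUES = ["it said", "mentions", "was on page", "recognize", "remember"]
INFERENCE_CUES = ["suppose", "infer", "figure out", "would have",
                  "therefore", "but", "if"]


def count_references(transcript, cues):
    """Count whole-word occurrences of each cue phrase in a transcript."""
    text = transcript.lower()
    total = 0
    for cue in cues:
        # Word boundaries keep "if" from matching inside other words.
        pattern = r"\b" + re.escape(cue) + r"\b"
        total += len(re.findall(pattern, text))
    return total


# Hypothetical verbal report for one item (not taken from the study's tapes).
report = ("I remember it said the nucleus holds the protons, "
          "but if that is true the answer would have to be B.")

memory_count = count_references(report, MEMORY_CUES)
inference_count = count_references(report, INFERENCE_CUES)
print(memory_count, inference_count)
```

Exact phrase matching misses inflected forms such as "remembered" or "inferred", which a human rater would presumably count, so a sketch of this kind would understate the references in real transcripts.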

Each of the three measures (response latency, memory references, and inference
references) was examined in each set of items by treatment x subjects analyses
of variance. Analyses were made in terms of the K vs. HOS classification of
processes as well as the Bloom categories. These results are provided as
Table 1.
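
For readers unfamiliar with the design, a treatment x subjects analysis of variance tests the treatment (item type) effect against the treatment-by-subject interaction. The sketch below shows that computation under stated assumptions: the data are invented for illustration and are not the study's results, and the function name `treatment_by_subjects_anova` is hypothetical.

```python
def treatment_by_subjects_anova(scores):
    """scores[s][t] is the score of subject s under treatment t.

    Returns (F, df_treatment, df_error) for the treatment effect, using
    the treatment x subject interaction as the error term.
    """
    n_subj = len(scores)
    n_treat = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_subj * n_treat)
    treat_means = [sum(row[t] for row in scores) / n_subj
                   for t in range(n_treat)]
    subj_means = [sum(row) / n_treat for row in scores]

    # Partition the total sum of squares into treatment, subject, and error.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_treat = n_subj * sum((m - grand) ** 2 for m in treat_means)
    ss_subj = n_treat * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_treat - ss_subj

    df_treat = n_treat - 1
    df_error = (n_treat - 1) * (n_subj - 1)
    f_ratio = (ss_treat / df_treat) / (ss_error / df_error)
    return f_ratio, df_treat, df_error


# Hypothetical mean latencies in seconds (columns: K, HOS); invented data.
latencies = [
    [10.0, 14.0],
    [12.0, 15.0],
    [9.0, 14.0],
]
f_ratio, df_treat, df_error = treatment_by_subjects_anova(latencies)
print(round(f_ratio, 2), df_treat, df_error)
```

With only two treatments, as in the K vs. HOS comparison, this F ratio equals the square of the paired t statistic computed on the same scores.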

Discussion 

The hypothesis that K vs. HOS items would be distinguished was confirmed.
Both sets of K items had significantly shorter response latencies than HOS
items, consistent with the presumably greater complexity of the higher-order
skills. (This measure of item complexity would thus seem to be a useful
complement to traditional difficulty indices.) One set of K items showed
significantly more memory references than the HOS items, consistent with the
hypothesized relation between conceptualizations of the "Knowledge" process-intent
of the item writer and the "memory reference" of the test-takers. In both sets
of items, there were significantly more inference references to HOS items than
to K items, showing a relation between items intended to elicit "higher order
skills" and the subjects' reports of inferential processes.
While the memory/inference measures were developed to distinguish between
K and HOS items, they were also examined within each of the four original Bloom
item groups. Memory references yielded no pattern across the two sets of items,
consistent with the 1972 study's findings. Latencies and inference references,
however, showed three significantly different levels corresponding with K, C/Ap
and An. (This suggests a means of empirically developing a taxonomy of
"process".)

These findings confirm the inappropriateness of applying the Bloom taxonomy
directly to the construction of multiple-choice items. The item-writer's
intent is demonstrably an insufficient guide to determining the "process"
elicited by an item. In addition, the results point to the need for
reconsidering such process descriptions in light of analyses of verbal solution
strategies. Further, information about these "extra-cognitive" aspects of
multiple-choice item performance (e.g., choice-latency, process description and demand
characteristics) seems useful as an adjunct to the use of traditional structural
characteristics (e.g., difficulty & discrimination indices) in the construction
of multiple-choice instruments capable of assessing process as well as content.